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1.  Introduction 


; ■ 


Previous  work  on  linear  regression  models  where  the  parameter 


space  is  restricted  by  linear  inequalities  has  concentrated  on 
inference  from  the  sampling  point  of  view.  For  this  model  the  least 
squares  estimate  is  a solution  to  a quadratic  programming  problem  [10]. 
This  estimate  has  a mixed  type  sampling  distribution  (i.e.,  partly 
continuous  and  partly  discrete)  that  is  difficult  to  handle  analytically, 
even  assuming  normal  errors.  The  standard  test  for  significance  of 
coefficients  based  on  the  Student-t  distribution  can  be  very  misleading 
[12].  Even  moments  of  the  estimate  are  difficult  to  derive  [10, 16] . 

Although  much  work  has  been  done  on  Bayesian  analysis  of  regression 
models,  nothing  has  appeared  when  the  parameter  space  is  restricted. 

This  paper  gives  an  analysis  using  a natural  conjugate  prior  of  the 
nixed  type.  Emphasis  is  placed  on  determining  posterior  probabilities 
that  constraints  are  binding  ana  on  determining  posterior  distributions 
of  the  parameters.  An  analysis  is  also  given  for  a vague  prior  of  the 
mixed  type. 

The  basic  model  is  presented  in  section  2 and  examples  are  given 
in  section  h . The  likelihood,  the  prior,  and  the  posterior  are  discussed 
in  sections  4,  5,  and  6 respectively.  Section  7 applies  the  theory 
to  analyze  temperatures  of  a chemical  reaction. 


2.  The  Basic  Model 


The  observations  are  assumed  to  follow  the  standard  linear  model 


y = X8  + e 


2. 


' 


where  y'  = (y^,*.*,yn)*  P = (01>  • • • »Pp) » x=  ( xi  •••  xp)>  and 

.1 

e — N(0,T  I).  The  parameter  space 


n = £ e : CB  > f)  (2.2) 

is  assumed  to  be  nonempty,  where  C is  an  r x p matrix. 

It  will  prove  convenient  in  the  sequel  to  separate  the  un- 
restricted from  the  restricted  parameters.  Thus,  we  assume  that  the 
last  p - Pq  columns  of  C are  all  zeros  so  that  C can  be  written 

C = (C0  0) 


where  0 is  an  rx(p-pQ)  dimensional  matrix  of  zeros  and 

= ( d^  ...  dr) . If  the  parameter  0 is  partitioned  into  restricted 
and  unrestricted  parameters,  0*  = (3^,  Bg)  with  8^  = (B^.-^Bp  ) 
and  X is  similarly  partitioned  into  (X^X^)  then  the  model  (2.1) 
can  be  written  y = X^B^  + ^2^2  + e 0 = ^ * ^he 

livelihood  for  the  model  can  be  written 

1(0, T|y)  *Tn/2  exp{-  T (y-XB)'(y-XB)/2)  1(0)  (2. 3) 


where  I is  the  indicator  function. 

We  assume  throughout  that  pQ  > 0 since  otherwise  there  are  no 
restricted  variables,  and  the  standard  Bayesian  analysis  applies 
[e.g.  4,  section  11.10]. 

The  constraint  i (l£i<r)  is  said  to  be  binding  if 
d^B1  = f ^ where  ('  = (f1,...,fr).  In  certain  applications  it  is  of 
interest  to  determine  which  constraints  are  binding.  Since  a Bayesian 
approach  to  the  problem  is  adopted  here,  we  compute  the  posterior 


j 


3. 


probabilities  of  the  various  possible  sets  of  binding  constraints. 

Throughout  this  paper  we  will  assume  that  CQ  is  of  full 
rank.,  that  s = {i^,...,!^}  is  a set  of  non-binding  constraints, 
and  that  s = l , irJ  is  a set  of  binding  constraints. 

We  can  write  the  binding  constraint  equations  as 


C-B,  = f- 
s 1 s 


(2.4) 


with 


, ' — ( d . , . . . , d . ) 

' t 4u  , >1  -4  ~ 


and  f 


; = (fi. 


-fi  ) 


If  r-kj>l,  the  set  of  B satisfying  (2.4)  is  a hyperplane  of 
dimension  smaller  than  p.  Thus,  if  the  posterior  distribution  on 
3 is  absolutely  continuous  with  respect  to  p dimensional  Lebesgue 
measure,  the  posterior  probability  of  the  constraint  s being  binding 
is  0.  To  alleviate  this  problem,  the  prior  distribution  will  be 
chosen  of  the  mixed  type  with  positive  probability  on  these  singular 
subsets.  Dickey  [5]  gives  a bibliography  of  previous  uses  of  priors 
of  the  mixed  type. 


3.  Examples 


3-1  Time  Series  Modelling 


The  autoregressive  process  of  order  p (AR(p)) 


yt  - ? Biyt-i  + 6P+i  + et 


t = 1, . . . ,n 


A 

with  yQ,  • • • »yi.p  considered  as  non-stochastic  and  e^  iid-^(0,T~  ) 
can  be  written  in  the  form  (2.1)  with  B#  = (P1, . . . ,Bp+1)  and 


4. 


Vn-1  yn-2’  • * yn-p  1 / 


* 


We  will  consider  the 
The  values  of 
explosive.  In  fact, 
Ci  is  given  by  (2.2) 


case  p = 2 (i.e.,  AR(2)). 

(0  ^ , 0 Q ) determine  if  the 
the  series  is  explosive  if 


with 


C = 


time  series  is 
3 £ fi  where 
and 


f'  = If  one  is  willing  to  assume  a priori  that  the 

time  series  is  not  explosive,  the  assumptions  of  section  2 are 

satisfied.  Even  though  the  matrix  X is  stochastic,  the  likeli- 

* 

hood  can  be  written  in  the  form  (2.3)  • Since  the  theory  of 
sections  4-6  is  still  valid  for  stochastic  X matrices,  the  non- 
explosive AR(2)  is  an  example  of  the  model  introduced  in  section  2. 

Zellner  [15,  section  7.3]  computes  numerically  the  posterior 
probability  of  the  series  being  explosive  assuming  3^  = 0 and  using 
the  vague  prior 


p(31,02,T)  “ t 


He  notes  that  a similar  analysis  could  be  performed  using  an 
informative  prior,  such  as  the  conjugate  normal/gamma.  Using  any 
prior  for  (3^,02)  that  is  absolutely  continuous  with  respect  to 
two  dimensional  Lebesque  measure,  the  posterior  probability  of  any 
of  the  constraints  being  binaing  is  0.  This  seems  unfortunate 


5. 

since  Box  and  Jenkins  [2]  demonstrate  empirically  that  models 

with  binding  constraints  are  useful  in  modelling  and  forecasting 

both  economic  and  physical  systems.  In  the  present  problem 

I = [1}  corresponds  to  a difference  of  order  1 and  s = [1,3) 

corresponds  to  a difference  of  oraer  2,  where  the  difference 

1C  "I 

of  the  series  y^  is  given  by  z^.  = (1-B)  y^  and  B,Jyt  = yt  y 
For  example,  5 = [1}  implies  0^+0^  = ^ so  that  the  AR(2)  can  be 
written 

zt  " (1_ei)zt-l  = 03  + et 

where  zt  = (1-B)yt.  Thus  the  first  difference  follows  an  AR(1) 
model. 

Box  and  Jenkins  choose  between  competing  models  by  sampling 
theoretic  criteria  such  as  goodness  of  fit  tests  and  residual  sum 
of  squares.  From  the  Bayesian  viewpoint  model  selection  can  be 
accomplished  by  the  computation  of  posterior  probabilities.  The 
forecast  functions  of  the  various  models  can  be  combined  in  the 
standard  method.  Namely,  the  predictive  density  of  the  future 
observations  y is  given  by 

p(y|y)  * 2 p( s | y )p(y | y, s) 
s 

where  p(y|y,s)  is  the  predictive  density  of  y given  constraints 
s are  non-binding  and  p(s|y)  is  the  posterior  probability  of 
constraints  s being  non-binding. 

This  example  is  continued  In  section  7 where  series  C 
from  [2]  is  studied. 
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3 . 2 Polynomial  Regression 


Stevens  [14]  derives  the  model 


Aw/(l-x)  = 2 3 ,x^ 

j=l  J 


(3.1) 


where  Aw  is  the  difference  in  weight  concentrations  of  a solute 
and  x is  a dimensionless  constant  which  can  be  considered  known 
[14, equation  2).  The  parameter  space  is  (2.2)  with 


C = (c^j)  a P x p matrix  with  c 


f 1 i=J 

i,  = \ -1  j-i 

d (_0  oth 


J-i+1 

otherwi se 


and  f =0  [14,  equation  4],  If  the  i1"  constraint  is  binding 
( 0 i = 3i+1),  a polymer  of  1 times  the  basic  molecular  weight 
is  not  present  in  the  particular  molecule. 

Based  on  equation  (3*1)  either  of  the  following  models  may 
be  appropriate 


p i 1 

(a)  y i = Awi/(l-xi)  = 2 0 ~ + ei 


(b)  = Awt  = 2 (1  - xt)  + et 

J — 4 

where  e ~ 7?(0, t I).  These  models  are  both  of  the  form  (2.1), 
and  model  (a)  is  the  standard  polynomial  regression  model  with 
restricted  parameter  space. 

The  choice  of  the  order  p of  the  polynomial  regression 
is  also  of  interest  in  this  problem.  Halpern  [6]  discussed  choice 
of  the  order  in  the  unrestricted  polynomial  regression  using  the 
natural  normal/gamma  prior.  The  analogous  method  could  be  used  in 
the  restricted  case. 
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3.3  Transition  Probability  Estimation 

judge  and  Takayama  [10,  pp.  176-8]  use  quadratic  programming 
to  obtain  the  least  squares  estimate  of  the  transition  probabilities 
of  a finite  Markov  Chain.  We  consider  the  chain  with  3 states  and 
demonstrate  how  the  theory  of  this  paper  applies.  Judge  and 
Takayama  show  the  equations 


y 


pij  + v f°r 


i < j < 2,  1 < t <n 


can  be  written  in  the  form 


y = X p + u with 


y' 


y<  = 


(y^>  y'2,  y'^)>  p'  = (pi»  $2’  p3^  u<  = (ui*  u2’  * 

(yJ,l* ' • ‘ ,yj,n)»  Pj  = ^plj,p2J,p3j^  Uj  = ^Ujl’  ’ ‘ ‘ ,Ujn)’ 


X1  0 J where  X±  = (z±  zg  z^)  and 


z3  * (y3,0’'"’yJ.n-l)' 


Since  pi+p2  + p^  = l,  one  can  eliminate  p^  and  write  these 
equations  as  (2.1)  with 


fh 


\ 


y‘  = (y;,  y'2>  y^  x = 


xi  / 
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3 ' = (p^p^)*  and  e = u.  The  inequality  restrictions  are  of  the 
form  (2.2)  with 


where  is  the  k x k identity  matrix  and  1^  is  a k vector  of 

ones. 

3*  ^ One  Way  Layout 

Consider  the  1 way  fixed  effect  ANOVA  model 


yij  = + eiJ  1-1,... fp  and  j=l,...,ni 

with  3^  fixed  and  e^.  i.i.d.  — 7?(0,t~  ).  We  assume  a priori 

that  the  means  are  non-decreasing  (i.e.,  < &2 — * ' * — ®p) • This 

constraint  can  be  written  in  the  form  (2.2)  with  C = (c.  .)  a (p-1)  xp 

^-0 

dimensional  matrix  with 


'ij 


J-i+1 

i=J 

Otherwise 


and  f=0.  An  example  of  the  use  of 


this  model  is  given  in  [1,  example  1.3] • 

Other  linear  restrictions  can  be  handled  similarly.  In  the 


case  of  monotone  decreasing  parameters,  Lindley  [11]  discussed  the 
posterior  distribution  of  the  parameters  assuming  a vague  prior. 
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4.  The  Likelihood 

The  following  theorem  expresses  the  likelihood  in  a form 
which  suggests  the  natural  conjugate  prior  of  the  mixed  type  to  be 
employed.  It  shows  that  the  likelihood  for  3 e i can  be  specified 
by  a = p-r+k  parameters  which,  in  general,  can  be  picked  in  different 
ways. 

Theorem  1 

For  any  set  s of  nonbinding  constraints  there  exists  a 
subset  [j1,.*.,ja]  of  {1,2, . . . , p]  such  that  if  = gj  and 
Y ' = ( • • • , ru)  then  the  likelihood  (2. 3)  can  be  expressed  as 

*(  S,  y,r  I y)  OB  Tn,/2  exp{  -t  ( z-Ry ) ' ( z-Ry  )/2}  i(n).  (4.1) 

The  event  Q can  be  expressed  as  the  set  of  linear  inequalities 

n = (Dy2  > w)  (4.2) 

where  D is  a kxa2  dimensional  matrix,  y'  = = 

dimension  Yj_  for  i=l  and  2,  a1  = p - pQ,  and  a2  = P0  + k-  r. 

If  either  of  the  dimensions  of  D is  0,  1(0)  =1.  The  'n  dimensional 

vector  z is  a function  of  X and  y. 

To  be  precise  the  quantities  z,  R,  D,  w,  and  y should  be 
subscripted  to  denote  which  of  the  possible  sets  of  size  « has  been 
picked.  We  assume  throughout  that  a rule  has  been  established  for 
picking  the  variables  so  that  the  subscript  can  be  surpressed  without 
confusion. 


1 


Proof : Since  the  matrix  CQ  was  assumed  of  full  rank,  the 
number  of  binding  constraints  satisfies  0 < r-k<pQ.  It  is  con- 
venient to  separate  the  proof  into  the  following  three  cases. 

Case  1 r-k=0  (no  binding  constraints) 

Since  r > 1 and  pQ  > 1 by  assumption,  one  has  min{a2,k}>_l 
in  this  case.  The  proof  follows  by  defining  z=y,  R=(X1  X2), 

7^  = 3^  for  i=l  and  2,  D = C,  and  w = f. 

Case  2 1 < r-k  < pQ 

We  can  pick  columns  E = of  C-  defined 

in  (2.4),  so  that  the  resulting  matrix  is  nonsingular.  Let  E be 
the  complement  of  E (in  {1, 2, . . . ,Pq} ) and  define  matrices 


C11 ( C21 ) 

as  columns 

E 

of 

C-(C  ) and  similarly 
s s 

^12 ( ^22  ) 

as  columns 

E 

of 

CS<Cs> 

where  C'„  = (d^  . . . d^  ) . If  k=0,  C21  and  C22  are  degenerate. 

Defining  vectors  3 A = (3  * > • • • ,3  ),  yA  = (3-.  , • • • ,3  • ) 

^ J r-k  Jr-k+l  JPC 

and  f'  = (f,  , . . . , f , ),  for  k>0  the  constraints  can  be  written 

s H xk 

Cll03  + C12Y2  = fI 
C21p5  + C22y2  > fs ' 

Using  the  above  equations  to  eliminate  0^  one  obtains 

35  = c1±  (r-  - c12y2). 
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The  nonbinding  constraints  can  be  written  as  Dy2  > w where 

D = C22  " C21C11C12  and  W = :s  ' C21Cllfs  ' 

Now  define  X^(X^)  as  columns  E(E)  of  X so  that 

X0  = | Xi0i  = X202  + X5Cll^i  ~ ^12y2^  + X4Y2  = Ry  + X3Cllfi 

with  R = (X2  X4-X3C"^C12)  and  y'  = (d2,y2).  Thus  y - X3  = z - Ry 
with  z = y - X3Cxlf g.  The  rest  of  the  proof  follows  easily  in  this  case. 
For  k = 0 the  proof  is  analogous. 

Case  3 r-k = p0  (restricted  parameters  completely  specified) 

In  this  case  E is  empty  so  that  Y2  as 

defined  in  case  2 don’t  exist.  One  has  that 

3 3 - cufl 

and  the  inequality  constraints  can  be  written  as 


C2lCll  fi  > fs 


which  determine  whether  the  constraints  can  be  binding  together  or  not. 


_1 

Letting  R = X2,  y = and  z = y - X^C^i-  the  result 


follows- 
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5-  The  Prior 

The  likelihood  of  Theorem  1 suggests  a prior  of  the 
mixed  form.  Let  p(s)  denote  the  a priori  probability  of  the 
set  of  constraints  s being  non-binding.  Since  p(y,t|s)  determines 
p(3jt|s),  it  suffices  to  specify  p(y,t|s),  which  we  assume  to  be 
the  natural  conjugate  prior  suggested  by 

Theorem  1.  Probabilities  of  events  and  distributions  of  parameters 
can  be  calculated  in  the  usual  way.  The  following  theorem  evaluates  the 
constant  of  integration  for  the  natural  conjugate  prior.  This  constant 
is  necessary  to  calculate  the  Bayes  factor  for  determining  posterior 
odds  of  the  different  sets  of  binding  constraints. 


I 
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_1  c±  min{a2,k.}  = 0 

c = - 

c1P(n)  otherwise 

where 

= r(u/2)(27r)a/2/|  V|1/2(ub/2)u/2 

and  P(n),  with  Q as  in  (4.2),  is  computed  using  the  distribution 
Y2  - Ta^(a2>  V22.1//b,u) ' That  Y2  Is  an  a2  dimensional 

multivariate  t random  variable  with  mean  a2,  precision  V22  1/b, 
and  u degrees  of  freedom  [4,  pp.  59-61].  The  dimension  of  the 
multivariate  t random  variable  often  is  omitted  in  the  sequel. 

Proof : If  min{a2,k}  = 0,  the  parameter  space  is  unrestricted  so 

that  the  prior  is  the  standard  normal/gamma  and  the  constant  is  easily- 
evaluated.  For  the  case  min{a2,k}  > 0,  integrating  the  unrestricted 
parameters  y^  one  obtains 

—1  a-i  /2  i p <•  (v+a0)/2 

0 * (&)  |vll!  Jr  exP{-r((Y2-a2),V22il(Y2-a2)+l>v)/2) 

I(fJ)dT  dy2 

where  v22#]_  = V22  - V21 V11 V12*  Integrating  out  t and  recognizing 
the  multivariate  t form,  one  has 

, (a,+a0)/2  , „ 

c“  - (2ir)  1 2 r(v/2)(|Vi:L||V22#1|)"«(Vb/2)-v/2  Jf(y2|a2,V22>1/b,v)dy2 

n 

where  f (y2|  a2,V22>1A,  v)  is  the  density  of  y2  where  Y2-Tq  U2,V22>3/b,v). 

2 

Thus  c * c^PfQ)  where  P(n)  is  computed  as  in  the  statement  of  the 


Theorem. 
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The  following  corollary  shows  that  P(  n)  of  Theorem  2 can  be 
computed  (if  r<pQ)  using  the  c.d.f.  of  the  multivariate  t dis- 
tribution with  mean  vector  equal  to  0 and  diagonal  elements  of  the 
precision  matrix  equal  to  1.  Approximations  to  the  c.d.f.  in  this 
case  have  been  given  by  John  [7].  In  the  case  r>pQ,  DY2  has  a 
degenerate  multivariate  t distribution  so  it  is  tore  convenient  to 
compute  P(fi)  using  the  a2  dimensional  distribution  of  y2  (rather 
than  the  distribution  of  DY2). 

Corollary  2.1 

If  r<P0>  p(fi)  " F(-w(-Da2,  (DV22^1D/ where  F(«  |a,F,v) 
is  the  c.d.f.  of  a T(a,P,v)  distribution.  Furthermore,  if  the  pre- 
cision matrix  2 = (l*V22  1 D'^Vb  = (c^. ) is  written  as  a a a'  with 
A = (o|^  6j_j)  where  is  the  Kronecker  delta  and 

A = (aij/(aiiajj)*)  then 

P(n)  = F(A(Da2-w) | 0,A,v). 

-1  / -1 

Proof;  If  k<a2  = pQ-r  + k,  then  Dy2  - Tk (Da2»  (DV22. 1D  ) A>#v) 

(see,  for  example,  [13,  Theorem  6.2.1]).  Thus,  if  pQ>r, 

P(fl)  = F(-w|-Da2,(DV"^1D/)"1/b,v).  Since  AD(Y2-a2)  ~ T(0,A,v), 

P(n)  = F(A(Da2-w)  I 0,  A,v) . 

As  stated  above  in  the  case  r>pQ,  it  is  more  convenient  to 
choose  a different  transformation  f rom  t hat  given  in  Corollary  2.1. 

If  the  precision  matrix  2 = V22  is  factored  as  in  Corollary  2.1 

(=  A a a'),  then  u = a (Y2  - a2)  ~-T(0,a,v).  Under  this  transformation 
the  region  0 = (Da  ^u>w-Da2).  Thus,  the  inequalities  on  u are 
linear  so  that  the  region  is  a polygon  or  an  infinite  region  with 
planer  boundaries.  John  [8,9]  has  given  methods  for  approximating 
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the  probability  P(uefi)  where  u has  the  above  distribution. 

In  the  case  minfa^k)  = 0 the  prior  of  Theorem  2 reduces 
to  the  standard  normal/gamma.  In  the  restricted  case  the  distribution 
of  the  parameters  is  more  complicated  and  is  given  in  the  following 
corollary. 

Corollary  2.2 

If  min{a2,k}  > 0,  conditional  distributions  of  the  parameters 
Y^  and  y2  are 

(i)  Y2|s  ~ Tq  (a2,V22<1AiJv)I(n) , a multivariate  t distribution, 
with  the  parameter  as  given,  truncated  to  the  space  (4.2). 

(ii)  The  p.d.f.  p(Y1|s)  of  Y-^  given  s is 
p(y^|s)  = ^(Y]_ I a]_»  ^ii. 2/u*v)  s(y-^) 

where  f is  the  density  of  the  multivariate  t and 

g(Y1)  = P(DY2>w)  where  y2  - T(ag>1,  (v-K^JVgg/fbv  + (Yj-a^1 

V11.2(Yl”al))#  V + al>  and  a2. 1 = a2  - V22V21  ^l"al)* 

Proof:  (i)  was  shown  in  proving  Theorem  2. 

(ii)  Ey  integrating  out  r one  has 

p(y| s )«( (y-a) 'V(Y-a)  +vb)“(v+a)/2  I(n). 

Using  the  Identity 

(Y-a)  V(Y-a)  = (Y1-a1)  vii.2^Yl_al^ + ^Y2"a2.1^  V22^Y2“a2.1^ 
one  has 

p(y1I»)-((v1-«1)'v11i2(y1-«1)+  vb)‘(v+°l )/2 


•wanp 


r -a/2 

J(vb+(V1-a1)'v11>2(Yl-a1)) 

n 


(1  + 


^V2"a2.l)  V22^Y2~a2.1^ 
vb+  (Y1-a1)/ V11<2(Y1-a1) 
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-(v+a)/2 


) 


(5- 


The  first  term  on  the  right  hand  side  of  (5*2)  is  proportional  to 
f(Yllai'Vn  2/*b,v^  311(1  the  second  is  Proportional  to  s ( ) . 

It  is  instructive  to  compare  these  distributions  with  the  ones 
obtained  in  the  unrestricted  case  (i.e.,  1(0)  =1  in  (5«1))«  In  the 
unrestricted  case  the  distribution  of  y2|s  is  T (a2,V22 
which  is  the  same  as  in  (i)  modulo  the  restriction.  Similarly,  the 
density  of  y-Js  in  the  restricted  problem  equals  its  density  in  the  un- 
restricted problem  times  the  function  g(Y-^)* 

Corollary  2.2  shows  that  picking  the  constants  (a,V,v,b)  of  the 
prior  (5.1)  is  considerably  more  complicated  than  picking  the  constants 
in  the  unrestricted  case.  The  following  simpler  prior  is  useful  when 
prior  knowledge  about  the  parameters  (given  the  constraints)  is  vague. 

Box  and  Tiao  [2]  use  similar  priors  as  reference  priors.  Of  course,  in 
repeated  experiments  the  posterior  of  the  first  experiment  can  be  used 
as  a prior  distribution  for  the  second.  Since  the  prior  (5*1)  is  con- 
jugate for  the  likelihood  (4.1),  it  is  useful  in  analysis  of  repeated 
experiments. 


Corollary  2.3 

Let  0 be  as  in  Theorem  2 and  assume  that  the  Lebesque  measure 
m(n)  < 00.  For  a prior  density  of  the  form 

( v+ou  ) /2— 1 

p(y,t  I s)  = cqt  exp{-T(bv  + (v1“a1)/V11(Y1-a1))/2}  I (0)  (5*3) 


then  one  has 


min {k , a2 ] > 1 
min{k,a2)  = 0 
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with  c2 


ctj/2 


(2tt)  x r(v/2)/|V1]L 


1 1/2 


(bv/2)V/2. 


Proof:  The  integral  of  (5*3)  is  easily  evaluated  since  (y^t),  which 

has  the  standard  no rmal /gamma  form,  is  independent  of  y2« 

Note  that  the  constants  of  the  prior  (5*3),  unlike  (5*1)  can  easily 
be  picked  since 

t| S ~ Pv/2,bv/2 


and 


7^  ( a.-|  , V.,  ^ /b , v ) • 


i»  ir 


The  prior  (5*3)  can  be  obtained  from  the  more  general  (5*1)  by 
setting  the  elements  of  V]_2>V22'  311(1  a2  eclual  to  changing  a to 
and  modifying  the  constant  of  integration.  Similarly,  the  posterior 
distribution  using  prior  (5.3)  can  be  obtained  from  the  posterior  using 
prior  (5.1)  by  the  same  specification.  This  will  be  utilized  in  the 
next  section. 


6.  The  Posterior 

Since  the  conjugate  prior  distribution  was  assumed  in  section  5>  the 
posterior  distribution  is  of  the  same  form  as  the  prior.  Thus,  distri- 
bution results  obtained  for  the  prior  distribution  (i.e.,  Corollary  2.2) 
are  valid  for  the  posterior  distribution  with  modified  constants. 

Theorem  3 gives  the  rule  for  updating  the  constants  in  going  from  the 
prior  to  the  posterior  distribution. 

Theorem  5.  Using  the  likelihood  (4.1)  and  the  prior  (5*1)  one  obtains 
a mixed  posterior  with 


p(y> T ly, s ) oer^v+a^2-1  exp(T(  (Y-a) 'V(Y-a)  + vb)/2)  I(n) 


(6.1) 
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with  constants  v = v + n,  V = V+R,R,  a = V-1(Va  + R'z) , = (vb+Q)/v 

where 

Q = ( z-Ry ) ' ( z-Ry ) + (a-a)'V(a-a)  + (^-a) ,R,R(?-a)  and  Y is  the  OLS 
estimate  (R'Rj^R'z. 

Proof:  Follows  by  combining  the  likelihood  with  the  prior  and  completing 

the  square  in  the  exponent  (see  [l6,  p.308]  for  a calculation  of  Q) . 

Using  prior  (5*3)  the  posterior  density  p(Y,x|s,y)  is  of  the 

form  (6.1)  but  with  the  following  modifications  in  the  constants. 

Vll  0 

Let  V = (qXX  q)  and  &>  = then  V, a,F,  and  Q are  as  given 

in  Theorem  3 and  v = n+  v-  a.^. 

The  following  theorem  gives  (up  to  a multiplicative  constant)  the 
posterior  probability  of  the  binding  constraints.  Zellner  [15,  section 
10.4]  gives  a similar  analysis  for  unconstrained  linear  regression. 

Theorem  4.  Using  likelihood  (4.1)  and  the  prior  (5.1)  the  posterior 
probability  p(s|y)  of  constraints  s being  non-binding  is  given  by 
p(s|y)  «Cj(s)  p(s)  where 

(c,  P(0)/P(n)  min[a2,k]  > 1 

c,  = c,(s)  = j • 

^ ^ min(a2,k]  = 0 

P(fl)  is  computed  as  in  Theorem  2,  P (0)  is  the  probability  of  the  same 
event  using  the  distribution  Y2  ~ T(a2,V22  where  the  constants 

are  given  in  Theorem  3,  and 

— i 1/2 

c4  = r(v/2)(vb)v/2|Y|1/2/(r(v/2)(vb)v/2|v|  ) 
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Proof:  For  min{a2,k}  > 1 one  has 

p(s,y  > t | y ) ® cp(s)  t (V+G£)/2'1  exp(-T (bv+(Y-lT)  'V( y-a) ) /2}I(a) 

where  c = c(s)  is  given  in  Theorem  2.  Using  the  method  of  Theorem  2 
to  integrate  out  (y,t)  one  obtains 

p(s  1 y)  <x  p(s)  r(v/2)(2-)a/2?(a)/(l"?|1/2(vb72)v/2  CjPfn)) 

• c^pfnJ/pfn). 

A similar  argument  gives  the  result  in  the  case  min{a2,k}  = 0. 

It  follows  easily  from  Theorem  4 that  the  Bayes  factor  of  model  s^ 
to  model  s,  is  given  by  c^(s^)/c^(s^) . The  rest  of  this  section 
deals  with  the  case  when  the  prior  information  is  vague.  Corollary  4.1 
specializes  the  result  of  Theorem  4 to  the  case  when  using  prior  (5*3) • 

Corollary  4.1.  Using  likelihood  (4.1)  and  the  prior  (5*3), 

P(s|y)  eeC3(s)p(s) 

( Cj,  F(n)/m(n)  min{k,a?}  > 1 

with  c,  = ) * ” 

* min{k,a2)  = 0 

C4  • r(v/2)|vi:Ll1/2(vb)v/8  wa2/2/(r(v/2)|7|1/2(v5)"/2) 


with 
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with  v,V,b  as  defined  following  Theorem  3 and  P(Q)  as  in  Theorem  4 
with  modified  parameters. 

Proof:  For  min{a2,k}  > 1 one  has 

p(v,T,s|y)  aec0T^v+a^//2"1  exp{-T(E‘  + ( Y-a ) '■ V ( Y-a ) )/2}I(n) 

with  v’jVjB’  as  above  and  cQ  given  in  Corollary  2.J.  Integration  with 
respect  to  (v,t)  gives  the  result. 

In  many  situations  the  prior  opinion  about  (Y-^.t)  will  not  depend 
on  s.  In  this  case  the  expressions  for  the  posterior  probabilities  of 
the  various  models  are  somewhat  simpler. 

Corollary  4.2.  Using  prior  (5-3)  and  assuming  b,v,  and  are  the 

same  for  all  s,  then  p(s|y)  is  given  as  in  Corollary  4.1  with 

- r(v/2)  °2/2/(]V\1/2m~/2) 

Proof:  The  quantities  Iv^l^2,  (vb)V//2,  and  r(v/2)  are  the  same. 

Before  experimentation,  knowledge  about  and  r may  also  be 

vague.  This  situation  can  be  characterized  by  small  values  of  b,v, 
and  V.^.  We  can  approximate  the  situation  by  letting  all  these  quantities 
converge  to  zero.  By  Scheff6's  Theorem,  the  limiting  values  then  serve 
as  approximate  probabilities  in  the  case  of  vague  prior  information. 

Corollary  4.3.  The  limit  of  c^  (in  corollary  4. 2)  as  v 0,  b -»■  0, 
and  -*  0 (i.e.,  all  coordinates  go  to  0)  is  given  by 

c4  * r((n-a2)/2)  tt^^/Ir'rI1/2  Q(n-a2)/2 


where  Q = (z-Ry) ' (z-R?). 
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Furthermore,  7(  Q)  of  corollary  4.1  converges  to  P(Dy2>w)  where 

a 

Y2  ~ T(y2, (n-a2)V22  ^/Q,  n-a2)  and  ^22  1 is  computed  from  7 = R'R. 

Proof:  Taking  the  limit  of  the  above  expressions,  one  has  ~ ->  n-a2, 

V -*  R'R,  vF  -*  Q,  and  a -*■  y. 

A reasonable  sampling  theoretic  approach  to  model  selection  is 
through  maximum  likelihood.  Although  the  m.l.e.  of  (p,t)  can  be 
computed  using  quadratic  programming,  to  contrast  the  Bayesian  and 
sampling  methods  it  is  instructive  to  calculate  the  m.l.e.  of  (Y,t,s). 
For  fixed  s the  unrestricted  maximum  of  the  logarithm  of  the  likeli- 
hood (4.1)  is  (ignoring  an  additive  constant) 

q(s)  = -n(l  + u Q/n)  /2 

where  Q = (z-RY) ' (z-Ry)  and  v is  the  OLS  estimate.  The  m.l.e.  of  s 
maximizes  q(s)  (i.e.,  minimizes  the  residual  sum  of  squares)  over 
those  s such  that  (Dy2>w).  If  any  of  the  theoretical  constraints 
are  violated  using  the  OLS  estimates  for  the  parameters,  the  model  is 
not  feasible  and  can't  be  selected.  This  is  somewhat  unfortunate  since 
sampling  error  could  lead  to  violation  of  a constraint  if  the  true 
parameter  is  near  the  boundary  of  the  parameter  space. 

The  Bayesian  method  does  not  completely  rule  out  models  which  have 
a constraint  violated  by  the  OLS  estimate.  Gross  violation  of  a con- 
straint does  lead  to  a small  posterior  probability.  For  example, 
using  the  approximate  distribution  of  Corollary  4.5 

P(0)  = F(a(Dy2  - w) |0,a,n-a2) 

where  A and  a are  determined  as  an  in  Corollary  2.1  with  precision 
matrix  (n-a2)V22< 1/Q  from  Corollary  4.3.  If  a component  of  a(DY2~w) 
is  less  than  -3,  the  posterior  probability  of  that  model  will  be  small. 
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Corollary  4.3  shows  that  the  Bayesian  method  of  model  selection 
also  leads  to  picking  models  that  have  small  residual  sums  of  squares 
(i.e.,  small  Q) . 

The  next  section  applies  this  theory  to  a time  series  of  tem- 
peratures of  a chemical  reaction  [2,  series  C]. 

7.  Analysis  of  Chemical  Temperature  Data 

Computation  of  the  posterior  probabilities  of  binding  constraints 
is  given  now  for  226  successive  readings  of  temperatures  of  a chemical 
process  [2,  series  C].  For  this  series  Box  and  Jenkins  tentatively 
identify  the  number  of  differences  to  be  d = 1 or  2 [2,  p.185]. 

O 

The  model  with  d = 2 (even  when  overfit)  is  untenable  based  on  the  x 
goodness  of- fit  test,  but  the  model  with  d = 1 passes  all  the  diag- 
nostic tests  [2,  pp. 292-3]. 

It  is  reasonable  to  restrict  attention  to  AR(2)  models  only  after 
looking  at  the  partial  autocorrelation  function.  Thus,  the  prior  dis- 
tribution is  chosen  in  this  case  after  some  data  analysis.  Dickey  [5] 
advocates  using  an  approximate  prior  after  some  "data  crunching. " 

After  the  identification  stage  of  Bex  and  Jenkins,  one  might  have 
strong  opinions  about  the  parameters.  As  an  operational  (or  reference) 
prior,  we  employ  the  vague  prior  (5*3)*  The  constants  c^(s),  which 
determine  the  Bayes  factors,  are  calculated  from  the  limiting  values  of 
Corollary  4.3*  Table  1 gives  the  value  of  u (p( s | y) /p(s ) ) for  the 
seven  possible  models.  The  logarithm  of  the  Bayes  factor  for  model  s^ 
to  s j is  given  by  u.  (p(B1|y)/p(s1) ) - u (p(s ^ | y)/p(s j ) ) . Thus  the 
Bayes  factors  can  be  computed  by  subtraction  and  then  exponentiation. 
Using  the  Bayes  factor  one  can  easily  calculate  posterior  probabilities 
for  any  choice  of  p(s).  The  values  for  p(s)  ■ 1/7  are  given  in 
Table  1. 

— Table  1 about  here  — 
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The  table  confirms  the  findings  of  Box  and  Jenkins  that  a single 
difference  is  best,  but  a probability  of  .3  is  assigned  to  the  model 
with  no  differences.  The  similarity  in  the  estimates  of  the  para- 
meters for  "the  raodels  corresponding  to  d = 0 and  d = 1 

shows  why  it  is  difficult  to  pick  between  these  models.  Of  course, 
one  might  assign  higher  prior  probability  to  higher  differences  (i.e., 
p((l})>  p(none))  since  this  is  an  uncontrolled  reaction. 
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Table  1 Bayesian  Analysis  of  Temperature  of  Chemical  Reaction 


s 

A 

*1 

A 

02 

A 

03 

Q 

<6*  (p(s|y)/p(s)) 

P(s|y) 

none 

1.8017 

-.8110 

1.28 

x 10“4 

3-8778 

247.937 

.300 

1 

1.8090 

-.8090 

5.53 

x 10'5 

4.0026 

248.752 

.678 

2 

-.0069 

.9931 

-2.16 

x 10~k 

42.9^9 

-18.962 

0 

3 

1-9903 

-1 

-1.79 

x 10"4 

4.3059 

237.863 

0 

1,2 

0 

1 

-7.05 

x 10"2 

43.005 

-8.942 

0 

1,3 

2 

-1 

-2.68 

x 10"3 

4.4384 

245.413 

.022 

2,3 

-2 

-1 

11.91 

14681 

-662.237 

0 
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Abstract 

This  paper  considers  the  general  linear  model  when  the  parameter 
space  is  subject  to  linear  inequalities.  Four  examples  satisfying 
the  assumptions  are  given.  A Bayesian  analysis  of  this  model  is  pre- 
sented using  a natural  conjugate  prior  density  of  the  mixed  type. 

An  analysis  is  given  for  the  case  that  prior  information  about  the 
parameters  is  vague.  The  Bayesian  and  sampling  methods  of  model  selec- 
tion are  compared  in  the  case  when  little  is  known  a priori. 
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